NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Cyclic peptide structure prediction and design using AlphaFold2

https://doi.org/10.1038/s41467-025-59940-7

Rettie, Stephen A; Campbell, Katelyn V; Bera, Asim K; Kang, Alex; Kozlov, Simon; Bueso, Yensi Flores; De La Cruz, Joshmyn; Ahlrichs, Maggie; Cheng, Suna; Gerben, Stacey R; et al (December 2025, Nature Communications)

Free, publicly-accessible full text available December 1, 2026
Protein language models learn evolutionary statistics of interacting sequence motifs

https://doi.org/10.1073/pnas.2406285121

Zhang, Zhidian; Wayment-Steele, Hannah K; Brixi, Garyk; Wang, Haobo; Kern, Dorothee; Ovchinnikov, Sergey (November 2024, Proceedings of the National Academy of Sciences)

Protein language models (pLMs) have emerged as potent tools for predicting and designing protein structure and function, and the degree to which these models fundamentally understand the inherent biophysics of protein structure stands as an open question. Motivated by a finding that pLM-based structure predictors erroneously predict nonphysical structures for protein isoforms, we investigated the nature of sequence context needed for contact predictions in the pLM Evolutionary Scale Modeling (ESM-2). We demonstrate by use of a “categorical Jacobian” calculation that ESM-2 stores statistics of coevolving residues, analogously to simpler modeling approaches like Markov Random Fields and Multivariate Gaussian models. We further investigated how ESM-2 “stores” information needed to predict contacts by comparing sequence masking strategies, and found that providing local windows of sequence information allowed ESM-2 to best recover predicted contacts. This suggests that pLMs predict contacts by storing motifs of pairwise contacts. Our investigation highlights the limitations of current pLMs and underscores the importance of understanding the underlying mechanisms of these models.
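The "categorical Jacobian" idea can be illustrated with a small sketch. It records how every output logit changes when each position is mutated to each amino acid, then collapses those changes into a pairwise contact score. The sketch assumes a model that maps a one-hot sequence of shape (L, 20) to per-position logits of the same shape. `toy_model`, with a single planted coupling between positions 1 and 4, is hypothetical and stands in for a real pLM such as ESM-2. The post-processing (centering, Frobenius norm of each 20×20 block, symmetrization, average product correction) follows a convention common for coevolution contact maps and is an assumption here, not the paper's exact procedure.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids
A = len(ALPHABET)

def one_hot(seq):
    """Encode a sequence string as an (L, 20) one-hot matrix."""
    x = np.zeros((len(seq), A))
    for i, aa in enumerate(seq):
        x[i, ALPHABET.index(aa)] = 1.0
    return x

def categorical_jacobian(model, seq):
    """J[i, a, j, b]: change in output logit (j, b) when position i is set to amino acid a."""
    x_wt = one_hot(seq)
    base = model(x_wt)  # (L, A) logits for the unmodified sequence
    L = len(seq)
    J = np.zeros((L, A, L, A))
    for i in range(L):
        for a in range(A):
            x_mut = x_wt.copy()
            x_mut[i] = 0.0
            x_mut[i, a] = 1.0  # substitute amino acid a at position i
            J[i, a] = model(x_mut) - base
    return J

def jacobian_contacts(J):
    """Collapse J into an (L, L) contact score: center, block norm, symmetrize, APC."""
    J = J - J.mean(axis=1, keepdims=True)  # center over the input amino acid a
    J = J - J.mean(axis=3, keepdims=True)  # center over the output amino acid b
    C = np.sqrt((J ** 2).sum(axis=(1, 3)))  # Frobenius norm of each 20x20 block
    C = (C + C.T) / 2.0
    np.fill_diagonal(C, 0.0)
    col_mean = C.mean(axis=0)
    return C - np.outer(col_mean, col_mean) / C.mean()  # average product correction

# Hypothetical toy "language model": a pairwise model with one coupling (1 <-> 4).
rng = np.random.default_rng(0)
L = 6
W = np.zeros((L, A, L, A))
W[1, :, 4, :] = rng.normal(size=(A, A))
W[4, :, 1, :] = W[1, :, 4, :].T

def toy_model(x):
    # logits[j, b] = sum over i, a of x[i, a] * W[i, a, j, b]
    return np.einsum("ia,iajb->jb", x, W)

seq = "MKTAYI"
C = jacobian_contacts(categorical_jacobian(toy_model, seq))
iu = np.triu_indices(L, k=1)
best = int(np.argmax(C[iu]))
top_pair = (int(iu[0][best]), int(iu[1][best]))
print(top_pair)  # (1, 4)
```

Because the toy model only couples positions 1 and 4, the highest-scoring pair recovered from the Jacobian is (1, 4). With a real pLM, `model` would wrap a forward pass returning per-position logits, and the L × 20 forward passes make this approach practical mainly for short sequences.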
Full Text Available
One-shot design of functional protein binders with BindCraft

https://doi.org/10.1038/s41586-025-09429-6

Pacesa, Martin; Nickel, Lennart; Schellhaas, Christian; Schmidt, Joseph; Pyatova, Ekaterina; Kissling, Lucas; Barendse, Patrick; Choudhury, Jagrity; Kapoor, Srajan; Alcaraz-Serna, Ana; et al (August 2025, Nature)

Free, publicly-accessible full text available August 27, 2026
Scalable protein design using optimization in a relaxed sequence space

https://doi.org/10.1126/science.adq1741

Frank, Christopher; Khoshouei, Ali; Fuß, Lara; Schiwietz, Dominik; Putz, Dominik; Weber, Lara; Zhao, Zhixuan; Hattori, Motoyuki; Feng, Shihao; de Stigter, Yosta; et al (October 2024, Science)

Machine learning (ML)–based design approaches have advanced the field of de novo protein design, with diffusion-based generative methods increasingly dominating protein design pipelines. Here, we report a “hallucination”-based protein design approach that functions in relaxed sequence space, enabling the efficient design of high-quality protein backbones over multiple scales and with broad scope of application without the need for any form of retraining. We experimentally produced and characterized more than 100 proteins. Three high-resolution crystal structures and two cryo–electron microscopy density maps of designed single-chain proteins comprising up to 1000 amino acids validate the accuracy of the method. Our pipeline can also be used to design synthetic protein-protein interactions, as validated experimentally by a set of protein heterodimers. Relaxed sequence optimization offers attractive performance with respect to designability, scope of applicability for different design problems, and scalability across protein sizes.
Full Text Available
Genomic language model predicts protein co-regulation and function

https://doi.org/10.1038/s41467-024-46947-9

Hwang, Yunha; Cornman, Andre L.; Kellogg, Elizabeth H.; Ovchinnikov, Sergey; Girguis, Peter R. (April 2024, Nature Communications)

Abstract Deciphering the relationship between a gene and its genomic context is fundamental to understanding and engineering biological systems. Machine learning has shown promise in learning latent relationships underlying the sequence-structure-function paradigm from massive protein sequence datasets. However, to date, limited attempts have been made in extending this continuum to include higher order genomic context information. Evolutionary processes dictate the specificity of genomic contexts in which a gene is found across phylogenetic distances, and these emergent genomic patterns can be leveraged to uncover functional relationships between gene products. Here, we train a genomic language model (gLM) on millions of metagenomic scaffolds to learn the latent functional and regulatory relationships between genes. gLM learns contextualized protein embeddings that capture the genomic context as well as the protein sequence itself, and encode biologically meaningful and functionally relevant information (e.g. enzymatic function, taxonomy). Our analysis of the attention patterns demonstrates that gLM is learning co-regulated functional modules (i.e. operons). Our findings illustrate that gLM’s unsupervised deep learning of the metagenomic corpus is an effective and promising approach to encode functional semantics and regulatory syntax of genes in their genomic contexts and uncover complex relationships between genes in a genomic region.
Easy and accurate protein structure prediction using ColabFold

https://doi.org/10.21203/rs.3.pex-2490/v1

Kim, Gyuri; Lee, Sewon; Karin, Eli Levy; Kim, Hyunbin; Moriwaki, Yoshitaka; Ovchinnikov, Sergey; Steinegger, Martin; Mirdita, Milot (December 2023, Research Square)

Abstract Since its public release in 2021, AlphaFold2 (AF2) has made investigating biological questions, using predicted protein structures of single monomers or full complexes, a common practice. ColabFold-AF2 is an open-source Jupyter Notebook inside Google Colaboratory and a command-line tool, which makes it easy to use AF2, while exposing its advanced options. ColabFold-AF2 shortens turnaround times of experiments due to its optimized usage of AF2’s models. In this protocol, we guide the reader through ColabFold best practices using three scenarios: (1) monomer prediction, (2) complex prediction, and (3) conformation sampling. The first two scenarios cover classic static structure prediction and are demonstrated on the human glycosylphosphatidylinositol transamidase (GPIT) protein. The third scenario demonstrates an alternative use case of the AF2 models by predicting two conformations of the human Alanine Serine Transporter 2 (ASCT2). Users can run the protocol without command-line knowledge via Google Colaboratory or in a command-line environment. The protocol is available at https://protocol.colabfold.com.
Full Text Available
Computational design of soluble and functional membrane protein analogues

https://doi.org/10.1038/s41586-024-07601-y

Goverde, Casper A; Pacesa, Martin; Goldbach, Nicolas; Dornfeld, Lars J; Balbi, Petra E M; Georgeon, Sandrine; Rosset, Stéphane; Kapoor, Srajan; Choudhury, Jagrity; Dauparas, Justas; et al (July 2024, Nature)

Abstract De novo design of complex protein folds using solely computational means remains a substantial challenge¹. Here we use a robust deep learning pipeline to design complex folds and soluble analogues of integral membrane proteins. Unique membrane topologies, such as those from G-protein-coupled receptors², are not found in the soluble proteome, and we demonstrate that their structural features can be recapitulated in solution. Biophysical analyses demonstrate the high thermal stability of the designs, and experimental structures show remarkable design accuracy. The soluble analogues were functionalized with native structural motifs, as a proof of concept for bringing membrane protein functions to the soluble proteome, potentially enabling new approaches in drug discovery. In summary, we have designed complex protein topologies and enriched them with functionalities from membrane proteins, with high experimental success rates, leading to a de facto expansion of the functional soluble fold space.
Full Text Available
State-of-the-Art Estimation of Protein Model Accuracy Using AlphaFold

https://doi.org/10.1103/PhysRevLett.129.238101

Roney, James P.; Ovchinnikov, Sergey (November 2022, Physical Review Letters)

Full Text Available
ColabFold: making protein folding accessible to all

https://doi.org/10.1038/s41592-022-01488-1

Mirdita, Milot; Schütze, Konstantin; Moriwaki, Yoshitaka; Heo, Lim; Ovchinnikov, Sergey; Steinegger, Martin (June 2022, Nature Methods)

Abstract ColabFold offers accelerated prediction of protein structures and complexes by combining the fast homology search of MMseqs2 with AlphaFold2 or RoseTTAFold. ColabFold’s 40–60-fold faster search and optimized model utilization enables prediction of close to 1,000 structures per day on a server with one graphics processing unit. Coupled with Google Colaboratory, ColabFold becomes a free and accessible platform for protein folding. ColabFold is open-source software available at https://github.com/sokrypton/ColabFold and its novel environmental databases are available at https://colabfold.mmseqs.com.
Full Text Available
Structure-based protein design with deep learning

https://doi.org/10.1016/j.cbpa.2021.08.004

Ovchinnikov, Sergey; Huang, Po-Ssu (December 2021, Current Opinion in Chemical Biology)

Full Text Available

